Clustering Semantically Similar and Related Questions
Abstract
The success of online question answering communities, which allow humans to answer questions posed by other humans, has opened up a whole new set of search, browse and clustering problems. One important problem arises from the need to show similar and related questions for a particular “probe” question. The first exercise is to define a measure of similarity and relatedness that leads to showing interesting results to the user, assuming that the user is interested in the probe question. However, this exercise reveals the inherent ambiguity of defining relatedness for this problem. The proposed solution takes this ambiguity into account and does not rely on a fixed similarity measure to filter questions and rerank them. Instead, the approach uses a two-step method to show relevant questions for the probe. The first step identifies the main topic of the question, which is then used to construct a base set of questions sharing that topic. This base set is then clustered, taking into account the lexical and semantic similarity between the questions. The hypothesis is that each of the identified clusters defines a type of relatedness with the probe; one of these types is the identity relation, which encompasses paraphrases and very similar questions. To evaluate the proposed technique, the results were assessed manually, and it was noted that in 90% of the cases the clustering technique is effective and the results displayed in this manner seem appealing.
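The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the main topic is identified or which similarity measures are used, so everything below is an assumption made for illustration. The topic is passed in by hand, the stopword list is a small hypothetical one, similarity is plain Jaccard overlap on word sets (lexical only, with no semantic component), and the clustering is a simple greedy single-link pass with an arbitrary threshold.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"how", "do", "i", "to", "on", "what", "is", "the", "for", "a", "which"}


def tokens(question):
    """Lowercase the question and return its set of non-stopword tokens."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Lexical similarity only: size of the overlap divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_base_set(topic, questions):
    """Step 1 (assumed form): keep the questions that mention the given topic word."""
    return [q for q in questions if topic in tokens(q)]


def cluster(questions, threshold=0.5):
    """Step 2 (assumed form): greedy single-link clustering.

    A question joins the first cluster containing a member whose Jaccard
    similarity to it reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for q in questions:
        for members in clusters:
            if any(jaccard(tokens(q), tokens(m)) >= threshold for m in members):
                members.append(q)
                break
        else:
            clusters.append([q])
    return clusters


# Made-up example questions; the topic "python" is chosen by hand.
questions = [
    "How do I install Python on Windows?",
    "How to install Python on Windows 10?",
    "What is the best Python book for beginners?",
    "Which Python book is best for a beginner?",
    "How do I bake bread?",
]
base = build_base_set("python", questions)
clusters = cluster(base)
for group in clusters:
    print(group)
```

In this toy run, the question about bread is dropped in the first step because it does not share the topic, and the remaining questions fall into separate groups (installation, book recommendations). Each group can be read as one "type of relatedness" in the sense of the abstract, although a realistic system would need a semantic similarity signal in addition to word overlap.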
Similar resources
The Effect of Teaching Vocabulary through Synonymous, Semantically Unrelated, and Hyponym Sets on EFL Learners’ Retention
Many textbooks include semantically related words and sometimes teachers add synonyms, antonyms, etc. to the words in order to present new vocabulary items without questioning the possible effects. This study sought to investigate the effect of teaching vocabulary through synonym, semantically unrelated, and hyponym sets based on Higa’s (1963) proposed continuum. A total of 120 Iranian intermed...
Evolutionary optimization for ranking how-to questions based on user-generated contents
In this work, a new evolutionary model is proposed for ranking answers to non-factoid (how-to) questions in community question-answering platforms. The approach combines evolutionary computation techniques and clustering methods to effectively rate best answers from web-based user-generated contents, so as to generate new rankings of answers. Discovered clusters contain semantically related tri...
The Impact of Semantic Clustering on Iranian EFL Advanced Learners’ Vocabulary Retention
This study investigated the impact of semantic clustering on Iranian EFL learners’ vocabulary retention at advanced level. Participants were female learners randomly assigned to two groups of 15. Four instruments (TOEFL test; vocabulary pretest; immediate posttest, and delayed recall posttest) were used. The experimental group underwent semantic clustering vocabulary presentation in which the l...
Clustering similar nouns for selecting related news articles
In both written language and spoken language, we sometimes use different words in order to express the same meaning. For instance, we use “candidacy” and “running in an election” as the same meaning. This makes text classification and event tracking difficult. To do this, we have to identify the words which are semantically similar to each other accurately. In this paper, we propose a method to...
Typology of Similar Verses in “Tafsir al-Mizan” Focusing on Shafa’at (Intercession) in Surah al-Baqarah
Among the verses of the holy Quran which are all illumination, some verses are remarkably prominent, enlighten other verses, and closely correspond to other verses. Understanding such verses is only possible by making use of the Quran-through-Quran interpretation method. “Tafsir al-Mizan”, which is one of the most efficient interpretations of the present age, has managed to greatly illuminate t...
Publication date: 2007